Conjugation-based compression for Hebrew texts
Similar resources
Querying Hebrew Texts via Word Spotting
We report on recent results with word-spotting (WS) in Hebrew historical texts, manuscript and printed. The advantage of such a retrieval system is that it works on images without any need for manual or computer transcription of the texts. The method allows for extremely rapid querying, while still maintaining high accuracy; thus, it should be considered as an important tool in historical textu...
A Morphological, Syntactic, and Semantic Search Engine for Hebrew Texts
This article describes the construction of a morphological, syntactic and semantic analyzer to operate a high-grade search engine for Hebrew texts. A good search engine must be complete and accurate. In Hebrew or Arabic script most of the vowels are not written, many particles are attached to the word without space, a double consonant is written with one letter, and some letters signify both vo...
CHAT: A System for Stylistic Classification of Hebrew-Aramaic Texts
1. Objectives. CHAT is a fully self-sufficient system for pre-processing, vectorizing and categorizing Hebrew-Aramaic texts. CHAT is designed to work with Bar-Ilan's corpus of Hebrew-Aramaic texts incorporating over 128 million words spanning more than two millennia. The kinds of problems that are of interest for this system do not concern categorization by topic but rather a number of scholarl...
Static Compression for Dynamic Texts
Two problems arise when semi-static word-based compression methods are applied to large texts, such as those stored in information retrieval systems. First, the space required for the model during decoding can become very large. Second, the need to handle document insertions means that the collection must be periodically recompressed if compression efficiency is to be maintained. Here we show tha...
Compression of Parallel Texts
The world-wide use of digital storage and communications devices is increasing the need to make texts available in multiple languages. To minimise the cost of storing and transmitting multiple translations of a text, one could store the text in just one language, from which other translations can be created. Unfortunately, the quality of machine translation techniques is not good enough for thi...
Journal
Journal title: ACM Transactions on Asian Language Information Processing
Year: 2007
ISSN: 1530-0226,1558-3430
DOI: 10.1145/1227850.1227854